An introduction to using Google’s BigQuery with R
Truly big data
Need a tool that helps you store, manage, and work with these data
Don’t want to learn a whole new language
What is Google’s BigQuery?
Serverless data warehouse
Built-in query engine
bigrquery
An R package that lets you work with data stored in BigQuery from R.
To install:
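A minimal sketch, assuming the CRAN release of the package is the one you want:

```r
# One-off install from CRAN (assumes a CRAN mirror is configured)
install.packages("bigrquery")
```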
To load into your current session:
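Assuming the package is already installed, attaching it looks like this:

```r
# Attach bigrquery so its bq_*() functions are available in this session
library(bigrquery)
```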
BigQuery is hierarchical:
Tables are stored in:
Datasets, which are stored in:
Projects
We will step through uploading your data to BigQuery
List useful information:
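One way to do this is with bigrquery's listing helpers. This is only a sketch: the project `my-project` and dataset `my_dataset` are hypothetical placeholders, and the calls need an authenticated Google account with access to them.

```r
library(bigrquery)

# Hypothetical identifiers, for illustration only
my_project <- "my-project"
my_dataset <- bq_dataset(my_project, "my_dataset")

# Projects your account can see
bq_projects()

# Datasets inside one project
bq_project_datasets(my_project)

# Tables inside one dataset
bq_dataset_tables(my_dataset)
```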
First, check that the table does not already exist:
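A minimal sketch of that check, assuming a hypothetical project `my-project`, dataset `my_dataset`, and table `my_table` (none of these names come from the slides):

```r
library(bigrquery)

# Hypothetical table reference; building it does not contact BigQuery
my_table <- bq_table(
  project = "my-project",
  dataset = "my_dataset",
  table = "my_table"
)

# Asks BigQuery whether the table exists; returns TRUE or FALSE
bq_table_exists(my_table)
```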
Next, create the (empty) table:
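A sketch of this step, again with hypothetical names. The small `my_data` data frame is invented for illustration and is used here only to supply the column names and types:

```r
library(bigrquery)

# Hypothetical table reference and example data
my_table <- bq_table(
  project = "my-project",
  dataset = "my_dataset",
  table = "my_table"
)
my_data <- data.frame(year = c(2020L, 2021L), value = c(1.5, 2.5))

# Create an empty table whose schema is derived from the data frame's columns
bq_table_create(my_table, fields = my_data)

# Confirm that the table now exists
bq_table_exists(my_table)
```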
[1] TRUE
Next, upload your data to that empty table:
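A sketch of the upload, assuming the same hypothetical table and an invented `my_data` data frame whose columns match the table's schema:

```r
library(bigrquery)

# Hypothetical table reference and example data
my_table <- bq_table(
  project = "my-project",
  dataset = "my_dataset",
  table = "my_table"
)
my_data <- data.frame(year = c(2020L, 2021L), value = c(1.5, 2.5))

# Send the rows of the data frame to the existing (empty) table
bq_table_upload(my_table, values = my_data)
```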
We will now step through how to work with big data without loading it into your computer’s memory
dplyr
Let’s collect data on Australia’s GDP.
These data are stored in the trade-dependence project, in the country_annual_information dataset.
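library(DBI)

# Project and dataset that hold the GDP data
selected_project <- "trade-dependence"
selected_dataset <- "country_annual_information"

# Open a DBI connection to BigQuery; queries are billed to the same project
con <- dbConnect(
  bigrquery::bigquery(),
  project = selected_project,
  dataset = selected_dataset,
  billing = selected_project
)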
con
<BigQueryConnection>
  Dataset: trade-dependence.country_annual_information
  Billing: trade-dependence
Create the connection to the reporter_gdp table:
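A minimal sketch, reusing the `con` object created above and assuming dbplyr is installed so that `tbl()` can work with a database connection. The variable name `reporter_gdp` is an illustrative choice:

```r
library(dplyr)

# Lazy reference to the remote table; no rows are downloaded yet
reporter_gdp <- tbl(con, "reporter_gdp")

# Printing fetches only a short preview of the table
reporter_gdp
```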
# Source:   table<reporter_gdp> [?? x 3]
# Database: BigQueryConnection
   year       reporter_code reporter_gdp_current
   <date>             <int>                <dbl>
 1 2003-01-01            92                   NA
 2 2003-01-01           136                   NA
 3 2003-01-01           531                   NA
 4 2003-01-01           292                   NA
 5 2003-01-01           408                   NA
 6 2003-01-01            NA                   NA
 7 2003-01-01           520                   NA
 8 2003-01-01           534                   NA
 9 2003-01-01           706                   NA
10 2003-01-01           728                   NA
# ℹ more rows
Query that table:
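A sketch of such a query, reusing `con` from above and assuming the filter is on reporter code 36, the value that appears in every row of the result shown next. The name `australia_gdp` is illustrative:

```r
library(dplyr)

australia_gdp <- tbl(con, "reporter_gdp") %>%
  # The filter is translated to SQL and runs inside BigQuery
  filter(reporter_code == 36) %>%
  # collect() downloads only the matching rows into a local tibble
  collect()

australia_gdp
```

Only the filtered rows travel to your machine; the full table stays in BigQuery.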
# A tibble: 23 × 3
   year       reporter_code reporter_gdp_current
   <date>             <dbl>                <dbl>
 1 2002-01-01            36              3.96e11
 2 2018-01-01            36              1.43e12
 3 2009-01-01            36              9.29e11
 4 2011-01-01            36              1.40e12
 5 2004-01-01            36              6.14e11
 6 2021-01-01            36              1.55e12
 7 2017-01-01            36              1.33e12
 8 2008-01-01            36              1.06e12
 9 2001-01-01            36              3.79e11
10 2012-01-01            36              1.55e12
# ℹ 13 more rows
R will write your SQL queries for you:
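A sketch using `show_query()`, which prints the SQL that a lazy query translates to instead of running it. It reuses `con` from above and assumes the same filter on reporter code 36:

```r
library(dplyr)

tbl(con, "reporter_gdp") %>%
  filter(reporter_code == 36) %>%
  # Print the generated SQL rather than executing the query
  show_query()
```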
You can perform that query in BigQuery’s in-built query engine: